I, Ned Wingreen, Ph.D. declare as follows: 

I am a Professor in the Department of Molecular Biology at 
Princeton University in Princeton, New Jersey. Immediately 
prior to that, I was Senior Research Staff Member at NEC 
Laboratories America, Inc., Princeton, New Jersey, the Assignee 
[...] identified patent application, and I participated in the 
[...] application and its offspring divisional applications. The 
[...] in identifying a novel, stable fold into which a novel sequence 
of amino acids can be configured. 

Using the method of Miller et al., which is also the method disclosed and claimed in the present 
application, we identified a small, highly designable protein fold that did not appear as a stand- 
alone fold in the Protein Data Bank (PDB). [Miller, et al., Emergence of Highly Designable 
Protein-Backbone Conformations in an Off-Lattice Model. Proteins 47, 506-512 (2002), copy 
attached as Exhibit B.] For the particular analysis described below, the method consisted of the 
following steps: (1) generating backbone configurations of a preselected length n by complete 
enumeration using a set of three dihedral angle pairs, (2) assigning a sphere of radius 1.9 Å to the 
beta carbon position of each residue, (3) eliminating configurations for which any of these 
spheres overlapped, (4) evaluating the surface exposure of each sphere in each remaining 
configuration, and eliminating all but the ~10,000 configurations with the lowest total surface 
exposure, (5) normalizing the surface exposure of the spheres in each remaining configuration, 
(6) generating sequences of hydrophobicities $h_i$ (= 0 or 1) of the same length as each of the 
remaining configurations, (7) determining for each sequence of hydrophobicities which of the 
remaining configurations was the ground state, (8) identifying those configurations which were 
ground states of the largest number of sequences of hydrophobicities, and (9) determining which 
of these configurations were novel, i.e. did not have a close match in the PDB. 
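
Steps (4) through (8) above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function name `designability`, the `keep` parameter, and the toy exposure values are hypothetical, steps (1) to (3) (backbone generation and overlap removal) are assumed to have already produced per-residue surface exposures, and a sequence is counted only when its ground state is unique, as in Miller et al.

```python
from itertools import product

def designability(exposures, keep):
    """Count, per retained configuration, the HP sequences it uniquely minimizes."""
    # Steps (4)-(5): keep the `keep` most compact configurations, then normalize.
    compact = sorted(exposures, key=sum)[:keep]
    normalized = [[a / sum(conf) for a in conf] for conf in compact]
    n = len(normalized[0])
    counts = [0] * len(normalized)
    # Steps (6)-(8): enumerate all 2**n HP sequences and find each ground state.
    for seq in product((0, 1), repeat=n):
        energies = [sum(h * a for h, a in zip(seq, conf)) for conf in normalized]
        lowest = min(energies)
        winners = [i for i, e in enumerate(energies) if abs(e - lowest) < 1e-12]
        if len(winners) == 1:  # a degenerate ground state designs nothing
            counts[winners[0]] += 1
    return normalized, counts

# Hypothetical per-residue exposures for four 4-residue configurations.
toy = [
    [1.0, 3.0, 3.0, 1.0],  # exposed middle, buried ends
    [2.0, 2.0, 2.0, 2.0],  # uniform exposure, no core
    [3.0, 1.0, 1.0, 3.0],  # buried middle, exposed ends
    [9.0, 9.0, 9.0, 9.0],  # open structure, removed by the compactness cut
]
normalized, counts = designability(toy, keep=3)
print(counts)
```

In this toy case the configuration with uniform exposure is never a unique ground state, which mirrors the observation that a well-defined core favors designability.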

By following the above steps we winnowed down the number of protein backbone 
configurations which merited consideration for design. First, the very large number of 
configurations generated from all possible combinations of the three dihedral angle pairs ($3^n$) 
was reduced to ~10,000 by considerations of self-overlap and compactness (see steps (1) to (4) 
above). Second, the remaining ~10,000 configurations were organized in a list starting from the 
configuration that was the ground state of the largest number of sequences. Configurations not 
falling near the top of this list (~ top 100) were considered unpromising for purposes of design. 
Finally, the top folds were tested for novelty by comparison with known protein backbone 
configurations in the PDB. 
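
To give a sense of the scale of the first reduction, the following Python sketch counts and lazily enumerates the discrete angle choices. It assumes, for illustration only, one free choice among three angle pairs at each of 23 positions; the exact number of free positions for a chain of a given length may differ slightly, and the function names are hypothetical.

```python
from itertools import islice, product

def count_configurations(n_positions, m=3):
    """Number of chains when each position takes one of m angle pairs."""
    return m ** n_positions

def enumerate_angle_choices(n_positions, m=3):
    """Lazily yield every assignment of an angle-pair index to each position."""
    return product(range(m), repeat=n_positions)

total = count_configurations(23)
first_three = list(islice(enumerate_angle_choices(23), 3))
print(total)
```

Because the enumeration is lazy, configurations can be filtered for self-overlap and compactness one at a time without storing the full set.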
One fold identified in this way became our target for synthesis. In terms of secondary structural 
elements, the fold consisted of a beta strand followed by an alpha helix, followed by a second 
beta strand. The beta strands folded over the alpha helix creating a two-stranded beta sheet as 
shown in Fig. 1. 



Figure 1. Ribbon diagram of beta-alpha-beta fold. The beta strands (yellow) form a beta sheet on top of the 
alpha helix (magenta). 
A specific amino-acid sequence, of length 33 residues, was designed to adopt the desired 
backbone configuration, based on standard considerations of packing and solvent exposure. The 
designed sequence is KRRTITLGGGEERIKKYREAFKNGNTEVTFQGQ, using the single-letter code 
for amino acids. The predicted configuration of this sequence folded into the beta-alpha-beta 
configuration is shown in Fig. 2. 



Figure 2. Predicted structure of sequence designed to adopt beta-alpha-beta fold. The detailed backbone and 
sidechain configurations for the 33-residue sequence are shown with nitrogens indicated in blue and oxygens in red. 






Serial No.: 09/730,214 
Filed: December 5, 2000 
Docket No. 1125722-0005 



The designed protein sequence of 33 residues was synthesized chemically and subjected to 
various analyses. First, the protein proved to be highly soluble in water, which allowed for 
standard biophysical tests. Specifically, the circular dichroism (CD) spectrum was obtained and 
analyzed (Fig. 3). The measured spectrum corresponds to an alpha helical content of 28% and a 
beta-strand content of 20%, which compare very favorably with the predicted values of 30% and 
18%, respectively. 




Figure 3. Measured circular dichroism (CD) spectrum of designed 33-residue sequence. The amplitudes of the 
characteristic features in the CD spectrum correspond to a folded structure consisting of 28% alpha helix and 20% 
beta strand. 
The specific heat of thermal denaturation was also measured (Fig. 4) and found to be consistent 
with two-state folding, i.e., a direct transition between an ensemble of unfolded configurations 
and a single folded configuration with decreasing temperature. 



[Figure 4 plot not reproduced; horizontal axis: T/°C.]



Figure 4. Measured specific heat of thermal denaturation of designed 33-residue sequence. The thermal 
denaturation curve (black) can be fitted extremely well by a theoretical curve corresponding to two-state folding, 
consistent with the existence of a single well-folded configuration. 
A 1D-NMR spectrum was obtained for the designed 33-residue sequence in solution (Fig. 5). 
The clear resolution of the peaks provided critical evidence that the designed sequence was 
indeed folding into a single unique structure. The peak widths proved somewhat too broad to 
allow reconstruction of the three-dimensional structure by 2D-NMR. 
Figure 5. 1D-NMR spectrum of the designed 33-residue sequence. The multiple peaks are consistent with a 
single folded structure. 



Together, the 1D-NMR spectrum and the CD spectrum provide strong evidence that not only is 
the designed sequence folding into a unique and stable structure, but that the unique structure is 
the target beta-alpha-beta fold. 



I hereby declare that all statements made herein of my own 
knowledge are true and that all statements made on information 
and belief are believed to be true and further, that these 
statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or 
imprisonment, or both, under Section 1001 of Title 18 of the 
United States Code and that such willful false statements may 
jeopardize the validity of the application or any patent issued 
thereon. 




Ned Wingreen, Ph.D. 
Emergence of Highly Designable Protein-Backbone 
Conformations in an Off-Lattice Model 

Jonathan Miller, Chen Zeng, Ned S. Wingreen, and Chao Tang* 

NEC Research Institute, Princeton, New Jersey 



ABSTRACT Despite the variety of protein sizes, 
shapes, and backbone configurations found in na- 
ture, the design of novel protein folds remains an 
open problem. Within simple lattice models it has 
been shown that all structures are not equally suit- 
able for design. Rather, certain structures are distin- 
guished by unusually high designability: the num- 
ber of amino acid sequences for which they represent 
the unique lowest energy state; sequences associ- 
ated with such structures possess both robustness 
to mutation and thermodynamic stability. Here we 
report that highly designable backbone conforma- 
tions also emerge in a realistic off-lattice model. The 
highly designable conformations of a chain of 23 
amino acids are identified and found to be remark- 
ably insensitive to model parameters. Although some 
of these conformations correspond closely to known 
natural protein folds, such as the zinc finger and the 
helix-turn-helix motifs, others do not resemble 
known folds and may be candidates for novel fold 
design. Proteins 2002;47:506-512. 
© 2002 Wiley-Liss, Inc. 

Key words: protein folds; off-lattice model; design- 
ability; protein design; evolution 

INTRODUCTION 

The de novo design of proteins — an object of enormous 
activity in recent years — has so far dealt primarily with 
the redesign of known protein folds. 1-8 Two major accom- 
plishments in the direction of designing a fold that is 
distinct from known natural folds are the synthesis of a 
right-handed coiled coil 9 and the synthesis of a zinc finger 
without zinc. 10-12 To challenge the best efforts of de novo 
design, nature offers roughly 1000 qualitatively distinct 
protein folds. 13 Why has it proven difficult to design new 
protein folds? What program should we follow to achieve 
ab initio design of novel folds? 

The principle of designability 14-19 offers an answer to 
both these questions for simple lattice models. The design- 
ability of a structure is measured by the number of 
sequences that design it, that is, the number of sequences 
that have the given structure as their unique lowest 
energy conformation. Structures can differ vastly in their 
designability, 14 and it has been shown that high designabil- 
ity entails other protein-like properties, such as muta- 
tional stability, thermodynamic stability, 14,15 and fast 
folding kinetics. 16,20 Design is hard in the sense that most 
structures have low designability and their associated 
sequences lack these protein-like properties. For successful 
de novo design, one should first identify the few highly 
designable structures. 

It is an open question whether designability applies to 
real proteins as it does to lattice polymers. Real protein 
structures have a degree of complexity that cannot be 
effectively represented within a simple lattice model. For 
example, on a lattice the angles between bonds differ from 
those naturally adopted in real proteins. In addition, 
although in a cubic-lattice model the cube minimizes 
surface area for a given volume and is perfectly packed, no 
counterpart of the perfect cube exists once the lattice is 
removed. For designability to guide practical design of new 
folds it must apply to realistic descriptions of protein 
structure. 

In this article we report the computation of designability 
within an off-lattice model that incorporates angles fa- 
vored by natural proteins, for protein chains of up to N = 
23 amino acids. We find that the essential qualitative 
features of designability survive the transition from lattice 
model to off-lattice model. In particular, it remains true 
that a small fraction of compact structures are highly 
designable: these are nondegenerate ground states for an 
enormous number of amino acid sequences. Most struc- 
tures, on the other hand, are ground states for few, if any, 
amino acid sequences. Furthermore, the sequences that 
fold into highly designable structures typically have en- 
hanced thermodynamic stability — the energy of the near- 
est excited state is separated from the ground-state energy 
by an appreciable gap. 

MODELS AND METHODS 

The model we adopt is closely related to the off-lattice, 
m-state discrete-angle model introduced by Park and 
Levitt. 21 Each configuration is defined by a sequence of Cα 
bonds of length 3.8 Å, and each pair of dihedral angles $(\phi, \psi)$ 
is restricted to one of only m alternatives; here we take 
m = 3. The set of m allowed angle pairs is chosen by fitting 
to the backbone coordinates of representative natural 
proteins, 21 as discussed below. To suppress self-intersections 
of the chain, we augment the model by introducing a 
Fig. 1. a-c: Backbone configurations of 1st, 4th, and 15th most 
designable 23-mer structures. d: Backbone configuration of the zinc 
finger 1PSV, 12 truncated to 23 amino acids. 



volume for the amino acid residues in the form of a sphere 
of radius $r_\beta$ centered on Cβ (the first carbon of the 
side-chain). The backbones of some configurations constructed 
in this fashion are shown in Fig. 1(a-c). 

This off-lattice model incorporates properties of real 
polymers not well reproduced in simple lattice models. On 
the lattice, for example, allowed ground-state structures 
were limited to those maximally compact structures that 
fill the unique rectangle or box of minimum surface area. 
Off the lattice, every structure can be expected to have a 
distinct surface area. However, open or extended structures 
are not expected to be designable. We entertain as 
plausible ground-state structures only those with a surface 
area below some cutoff value $A_c$, which enters our 
computation as a parameter.* 

Because a discrete angle set represents only a crude 
approximation to a continuum of angles, it is unrealistic to 
expect the surface area of a discrete-angle structure to 
faithfully reproduce the surface area of a structure built 
from more flexible angles. Importantly, using flexible 
angles would allow our more open structures (e.g., those 
just below the cutoff $A_c$) to contract and reduce their 
exposed surface areas. To achieve this equalizing effect of 
a continuum of angles within the limitations of a discrete-angle 
model, we normalize the vector of solvent-accessible 
surface areas $A = (a_1, \ldots, a_N)$, where $a_i$ is the solvent-accessible 
surface area of the i-th residue, in such a way as 
to preserve the pattern of surface exposure along a chain. 
A suitable procedure† is to normalize the vector A for each 
structure by the total exposed surface area of that structure: 
$\tilde{A} = A / \sum_i a_i = (\tilde{a}_1, \ldots, \tilde{a}_N)$. This procedure treats all 
structures below the cutoff $A_c$ as equally compact while 
preserving each structure's individual pattern of surface 
exposure along the chain. 
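
The normalization just described, dividing each residue's solvent-accessible area by the structure's total exposed area, can be sketched in a few lines of Python. The function name and the sample areas are illustrative assumptions, not values from the calculation.

```python
def normalize_exposure(areas):
    """Divide each residue's exposed area by the structure's total exposed area."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total exposed surface area must be positive")
    return [a / total for a in areas]

# Hypothetical per-residue areas in square angstroms.
pattern = normalize_exposure([10.0, 30.0, 40.0, 20.0])
print(pattern)
```

After normalization every structure's exposures sum to one, so only the pattern of exposure along the chain distinguishes one structure from another.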

As with real proteins, description and comparison of 
configurations off-lattice demands precision about what 
we mean by the term "structure." For example, a protein 
structure obtained by NMR represents an ensemble of 
configurations, no element of which necessarily provides a 
better fit to the data than any other. This ensemble 
presumably reproduces the temperature-induced fluctua- 
tions of a natural protein around its native state. On 
averaging over this ensemble for small stably folded 
polypeptides in the PDB database, one finds a typical 
center-of-mass root mean square (crms) of roughly 0.3-0.5 
Å per residue. A similar range of crms can be inferred from 
the B values of protein crystals. 23 Accordingly, our off-lattice 
polymer configurations are grouped into clusters 
consisting of all configurations lying within a crms distance 
$\lambda$ per residue of one another. Configurations within a 
cluster are to be thought of as variations of a single 
structure, and subsequently we will refer to clusters and 
structures interchangeably. 
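
The text does not specify the clustering algorithm, so the following Python sketch should be read as one simple possibility rather than the procedure actually used: a greedy "leader" clustering in which each configuration joins the first cluster whose founding member lies within the radius. The `distance` argument stands in for the crms after optimal superposition, which is not implemented here; the one-dimensional points in the example are purely illustrative.

```python
def cluster(items, distance, radius):
    """Greedy leader clustering: join the first cluster whose founder is within radius."""
    clusters = []
    for item in items:
        for members in clusters:
            if distance(item, members[0]) <= radius:
                members.append(item)
                break
        else:
            # No existing cluster is close enough, so this item founds a new one.
            clusters.append([item])
    return clusters

# Toy example: points on a line stand in for backbone configurations.
groups = cluster([0.0, 0.1, 1.0, 1.05, 2.0], lambda a, b: abs(a - b), radius=0.4)
print(groups)
```

Other schemes (for example, requiring every pair within a cluster to lie within the radius) would give somewhat different groupings.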

We define the designability of a structure as the sum of 
the designabilities of its included configurations. The 
designability of a configuration is simply the number of 
sequences with that configuration as a unique ground 
state. 14,15 To evaluate the energy of a sequence on each 
configuration, we associate a hydrophobicity $h_i$ with each 
amino acid of the sequence. In practice, we assign a 
hydrophobicity which is either 0 (Polar) or 1 (Hydrophobic) 
to each monomer to create an HP-sequence 24; that this is a 
reasonable simplification finds support in the work of 
Beasley and Hecht 1 [cf. Fig. 3(e) for the results of a more 
general choice]. The energy of a particular sequence folded 
into a particular configuration is obtained by taking the 
sum of the products of each amino acid's hydrophobicity $h_i$ 
with its normalized surface exposure $\tilde{a}_i$: 

$$E = \sum_{i=1}^{N} h_i \tilde{a}_i \qquad (1)$$

We numerically evaluate the energy of all HP-sequences 
for all configurations. 
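
As a small illustration of the energy evaluation, the Python sketch below computes the energy of one sequence on each configuration as the sum of hydrophobicity times normalized exposure, and reports the ground state together with the gap to the next-lowest configuration. The function names and the two toy exposure patterns are assumptions made for this example.

```python
def energy(seq, exposure):
    """Sum over residues of hydrophobicity times normalized surface exposure."""
    return sum(h * a for h, a in zip(seq, exposure))

def ground_state_and_gap(seq, configurations):
    """Return (index of the lowest-energy configuration, gap to the next one)."""
    ranked = sorted((energy(seq, conf), i) for i, conf in enumerate(configurations))
    (e0, best), (e1, _) = ranked[0], ranked[1]
    return best, e1 - e0

# Two hypothetical normalized exposure patterns for a 4-residue chain.
configs = [
    [0.125, 0.375, 0.375, 0.125],  # ends buried
    [0.375, 0.125, 0.125, 0.375],  # middle buried
]
best, gap = ground_state_and_gap((1, 0, 0, 1), configs)
_, tie_gap = ground_state_and_gap((1, 1, 1, 1), configs)
print(best, gap, tie_gap)
```

A gap of zero signals a degenerate ground state; such a sequence does not count toward the designability of any configuration.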

Except as indicated explicitly in the text, we choose 
discrete angles and the amino acid radius to optimize the 
fit to the backbone of the zinc-less synthetic zinc finger 12 
1PSV [Fig. 1(d)]. We find that there are many angle sets 
that fit the backbone of 1PSV almost equally well. For 
example, the crms per residue between 1PSV and the 
structure obtained from each of our 10 best angle sets 
varies from 0.844 to 0.913 Å. The angle set we use for most 



*We evaluate the area of each Cβ sphere accessible to a probe sphere 
of radius 1.4 Å, by the methods used in the program SERF; the 
slightly different values of surface area obtained by different methods 
do not in any way alter the outcome of the calculations. 



†We have checked that certain alternative normalizations (e.g., 
normalizing by the total solvent-inaccessible surface area) do not alter 
the set of highly designable structures that emerge from our calcula- 
tion. With no normalization, higher designability becomes closely 
correlated with lower solvent-accessible surface area. 
Fig. 2. Histogram of designabilities of 23-mer structures, using $r_\beta$ = 
1.9 Å. The surface area cutoff $A_c$ is such that 10,000 configurations 
participate in the calculation, grouped into 4688 clusters with cluster 
radius $\lambda$ = 0.4 Å. 



of the calculations presented in this article is $(\phi, \psi)$ = 
(-95°, 135°), (-75°, -25°), and (-55°, -55°). The first pair 
lies in the β-region of the Ramachandran plot, and the 
other two pairs lie in the α-region. We take $r_\beta$ = 1.9 Å, the 
radius above which the amino acids fit to the backbone of 
1PSV would clash. 

RESULTS 

The designability of a structure denotes the number of 
distinct HP-sequences having that structure as their unique 
ground state. The distribution of designabilities for our 
model, displayed in Figure 2, reproduces a crucial feature 
first observed on the lattice: although most structures 
have very low designability, the trailing edge (or tail) of 
the distribution consists of a small number of structures of 
very high designability. Thus, designability distinguishes 
a small subset of structures from generic ones. 

It turns out that the identities of these highly designable 
structures depend only weakly on the values of the parameters 
that enter our calculation: the surface area cutoff $A_c$, 
clustering radius $\lambda$, side-chain radius $r_\beta$, the set of allowed 
dihedral angles, and the range of amino acid hydrophobici- 
ties. More specifically, a significant fraction of structures 
identified as highly designable for one set of parameter 
values remains highly designable when these parameters 
are varied. We provide evidence for this important observa- 
tion in the next five subsections. 

Surface Area Cutoff 

As discussed before, open structures are expected to 
exhibit low designability. We anticipate that the highly 
designable structures of interest to us will fall mainly 
within the class of compact structures; therefore, only 
these compact structures are needed in our calculation. 
The surface area cutoff $A_c$ determines how compact a 
structure must be to qualify. We expect that, provided the 
choice of $A_c$ is not too restrictive, its particular value ought 
not to be important. 



A computationally practical choice of the surface-area 
cutoff eliminates most of the less compact configurations. 
A few of these might have proven highly designable if 
retained; however, our objective is not to find all highly 
designable structures, but only to identify some of them. 
Therefore, our major concern is not that we might incor- 
rectly discard a few designable structures, but rather that 
we might produce false positives (structures that appear to 
be highly designable with a restrictive value of the cutoff 
but have low designability for a more relaxed cutoff). A 
larger cutoff admits previously disallowed configurations 
that "steal" some sequences from a configuration originally 
identified as highly designable, thereby reducing its design- 
ability. 

In practice, as shown in Figure 3(a), highly designable 
structures tend to remain highly designable with increas- 
ing surface-area cutoff. For example, 9 of the 10 most 
designable structures remain within the 100 most design- 
able even after the surface-area cutoff is relaxed suffi- 
ciently to admit a 10-fold increase in the number of 
participating structures. 
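
The robustness measure quoted here (how many of the k most designable structures remain among the top 100 after a parameter change) can be sketched as follows. The function name and the toy rankings are hypothetical; in practice each ranking would be a list of structure identifiers sorted by decreasing designability.

```python
def fraction_retained(before, after, top_k, within):
    """Fraction of the top_k entries of `before` found in the first `within` of `after`."""
    survivors = set(after[:within])
    return sum(1 for s in before[:top_k] if s in survivors) / top_k

# Toy rankings of structure labels before and after relaxing a parameter.
before = list("abcdefghij")
after = list("bacxdyezfg")
all_kept = fraction_retained(before, after, top_k=4, within=5)
most_kept = fraction_retained(before, after, top_k=5, within=5)
print(all_kept, most_kept)
```

With top_k = 10 and within = 100, a result of 0.9 would correspond to the "9 of the 10" statement above.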

Clustering Radius 

As discussed in the previous section, structures whose 
backbones differ insignificantly from one another ought 
not to be considered distinct. This observation is embodied 
in our calculation by grouping into clusters those structures 
whose backbone configurations lie within a certain 
crms distance, $\lambda$, of one another. Varying the clustering 
radius, $\lambda$, leaves unchanged the set of configurations that 
participate in the calculation. For $\lambda \lesssim 0.1$ Å, nearly every 
cluster consists of a unique configuration. To exhibit the 
dependence of the most designable structures on $\lambda$, we fix a 
configuration and follow the designability of the cluster to 
which that configuration belongs, as a function of $\lambda$. As 
shown in Figure 3(b), the most designable structures 
remain roughly the same as $\lambda$ is varied over a wide range. 

Side-Chain Radius 

Excluded volume is incorporated by means of a hard 
sphere of radius $r_\beta$ centered on the β-carbon of each amino 
acid. Increasing the side-chain radius $r_\beta$ eliminates some 
configurations because of steric clashes, whereas decreasing 
$r_\beta$ admits previously ineligible configurations. Starting 
at $r_\beta$ = 1.9 Å, we identify the most designable structures 
and then count the fraction of these structures that remain 
highly designable as $r_\beta$ is reduced. As shown in Figure 3(c), 
the identities of the most designable structures are well 
preserved. 

Set of Dihedral Angles 

Next, we address to what extent an outcome depends on 
a particular choice of the discrete set of dihedral angles. A 
discrete set of angles cannot sample the structure space 
fully and so cannot "hit" all possible structures. On the 
other hand, we know that the designability of a structure 
depends on the local density of solvent-exposure vectors A 
with highly designable structures occupying the lowest 
density regions. 15 If the subset of structures sampled by a 
discrete set of angles reasonably preserves density in the 
Fig. 3. Sensitivity to parameter changes of the most designable structures from Figure 2. a: Fraction of the 
10, 20, 40, or 60 most designable structures that remain in the 100 most designable as the surface-area cutoff 
increases. The initial cutoff $A_c$ is chosen so that only the 1000 most compact configurations participate and $A_c$ 
increases until 10,000 configurations participate. b: Fraction of the 10, 20, 30, or 40 most designable structures 
that remain in the 50 most designable as the clustering radius $\lambda$ is increased. The 5000 most compact 
configurations participate in the calculation and $r_\beta$ = 1.9 Å. c: Fraction of the 10, 20, 40, or 60 most designable 
structures that remain in the 100 most designable as the side-chain radius $r_\beta$ is changed. We have chosen the 
surface area cutoff so that 5000 structures participate in the designability calculation for $r_\beta$ = 1.9 Å. If some 
configurations of the original most designable structures are not among the 5000 most compact configurations 
for some smaller $r_\beta$, we nevertheless retain them in the calculation. The clustering radius is $\lambda$ = 0.4 Å. d: 
Fraction of the 10, 40, 70, or 100 most designable structures that remain in the 100 most designable as 
configurations from other angle sets are added. The values of the five angle sets are as follows: set #1 = (-95°, 
[...] 145°), (-85°, -15°), (-75°, -35°); set #4 = (-105°, 145°), (-85°, -35°), (-85°, -5°); set #5 = (-105°, 
145°), (-85°, -35°), (-85°, -15°). e: Designability of structures obtained from 4,000,000 randomly generated 
sequences of real numbers in [0,1] versus designability from enumeration of HP-sequences. The 10,000 most 
compact configurations participate in the calculation, $\lambda$ = 0.4 Å, and $r_\beta$ = 1.9 Å. (Note the suppressed zeros in 
panels a, b, and d.) 



space of structures, highly designable structures should 
remain highly designable as we improve our sampling of 
structure space. 

To examine this possibility, we identify configurations 
generated by one angle set and follow their cluster design- 
abilities as configurations from other angle sets are added. 
We take five different angle sets derived from fitting to 
1PSV, and use the most compact configurations generated 
by each set. We calculate the designability of structures by 
using configurations from, respectively, one, two, three, 
four, and finally all five sets. We observe in Figure 3(d) 
that the most designable structures in set #1 remain 
highly designable even as configurations from sets #2, #3, 
#4, and #5 are added. This result is maintained under 
permutation of the five sets. Apparently, any reasonable 
choice of angle set covers the structure space sufficiently 
well that highly designable structures can be identified 
with high probability. 
Fig. 4. Maximum energy gap (red dots) and average energy gap 
(black dots) for the HP-sequences that design a given structure, plotted 
versus structure designability. The 10,000 most compact configurations of 
the 23-mer participate in the calculation, with $\lambda$ = 0.4 Å and $r_\beta$ = 1.9 Å. 



HP Sequences 

To check whether the identification of designable struc- 
tures depends on our use of HP (binary) sequences of 
amino acids, we recalculate designabilities by using amino 
acids with continuous real-valued hydrophobicities. We 
randomly choose 4,000,000 sequences $h = (h_1, \ldots, h_N)$, 
where $h_i \in [0,1]$, and evaluate their energy for all configurations 
using Eq. (1). In Figure 3(e) we plot the designability 
calculated this way against that from the enumeration 
of HP sequences. As the figure shows, the highly design- 
able structures computed by these two alternative meth- 
ods are nearly identical. 
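
A minimal Python sketch of this sampling variant follows. It draws sequences of real-valued hydrophobicities uniformly from [0,1] and credits each to its lowest-energy configuration; the function name, the seed, the sample size, and the three toy exposure patterns are illustrative assumptions, and exact ties (which have vanishing probability for real-valued draws) are simply resolved in favor of the first configuration.

```python
import random

def sample_designability(configurations, n_samples, seed=0):
    """Estimate designability by sampling real-valued hydrophobicity sequences."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(configurations[0])
    counts = [0] * len(configurations)
    for _ in range(n_samples):
        h = [rng.random() for _ in range(n)]
        energies = [sum(hi * ai for hi, ai in zip(h, conf)) for conf in configurations]
        counts[energies.index(min(energies))] += 1
    return counts

# Hypothetical normalized exposure patterns for a 4-residue chain.
configs = [
    [0.125, 0.375, 0.375, 0.125],
    [0.25, 0.25, 0.25, 0.25],
    [0.375, 0.125, 0.125, 0.375],
]
counts = sample_designability(configs, n_samples=2000)
print(counts)
```

Comparing these sampled counts with counts from full HP enumeration, structure by structure, gives the kind of scatter plot described for Figure 3(e).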

Parameter Independence 

In the preceding five subsections we have shown that the 
parameters can sustain a considerable degree of variation 
without significantly changing the outcome of the design- 
ability calculation. The weak dependence of the set of 
highly designable structures on parameters is illustrated 
in Figure 3. Because the identity of the highly designable 
structures is robust to parameter variation, we now exam- 
ine their potential as candidates for design. 

Gap 

In particular, a prerequisite for design is believed to be 
the presence of a large separation between the ground- 
state energy and the energy of the lowest excited state. For 
each structure, we have identified the HP-sequence that 
makes this gap the largest. The value of this largest gap is 
shown in Figure 4, as a function of the designability of the 
structure. To convert the vertical scale of Figure 4 to real 
energies, we observe that one unit of energy corresponds to 
a sequence of exclusively hydrophobic amino acids ($h_i$ = 1) 
folded into one of our typical compact structures. Our 
choice of surface area cutoff $A_c$ guarantees that a typical 
compact configuration has around half of its maximal 
accessible surface exposed (about 25 Å² per residue). A 
conservative estimate for the energy of exposed surface, 23 
20 cal/Å²/mol, then yields an energy on the order of 10 
kcal/mol for a 23-mer. The highest gap energies achieved 
in Figure 4, of order 0.05, therefore correspond to a gap of 
0.5 kcal/mol, around $k_B T$ for room temperature. This gap is 
roughly the energy to promote one hydrophobic amino acid 
from core to surface. Also plotted is the average gap for all 
HP-sequences that design a structure. It is evident that 
high designability correlates strongly with a large gap. 
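
The unit conversion in this paragraph can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the text (25 Å² per residue, 23 residues, 20 cal/Å²/mol, a dimensionless gap of 0.05); the gas constant value and the choice of 298.15 K for room temperature are assumptions added for the comparison.

```python
AREA_PER_RESIDUE = 25.0   # exposed area per residue, in square angstroms (from the text)
N_RESIDUES = 23           # chain length (from the text)
ENERGY_PER_AREA = 20.0    # cal per square angstrom per mol (from the text)
GAP_MODEL_UNITS = 0.05    # largest gaps in Figure 4 (from the text)

# One model energy unit: a fully hydrophobic 23-mer on a typical compact structure.
unit_energy_kcal = AREA_PER_RESIDUE * N_RESIDUES * ENERGY_PER_AREA / 1000.0
gap_kcal = GAP_MODEL_UNITS * unit_energy_kcal

# Thermal energy at room temperature (assumed: R = 1.987204e-3 kcal/mol/K, T = 298.15 K).
kT_kcal = 1.987204e-3 * 298.15

print(unit_energy_kcal, gap_kcal, kT_kcal)
```

The unrounded product is 11.5 kcal/mol, which the text rounds to "on the order of 10"; the corresponding gap of roughly 0.5 to 0.6 kcal/mol is indeed comparable to the thermal energy of about 0.59 kcal/mol.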

DISCUSSION AND CONCLUSION 

The principle of designability is that some structures are 
intrinsically easier to design than others. However, up to 
now, designability has been shown only in highly restric- 
tive lattice models. Our calculations indicate that the 
qualitative features of designability in lattice models are 
also exhibited off-lattice. Namely, a small minority of 
off-lattice structures are distinguished by high designabil- 
ity: these structures are lowest-energy states for many 
more than their share of sequences. Moreover, the se- 
quences associated with these structures have enhanced 
thermodynamic stability. The work presented here, using 
an off-lattice model for protein-backbone configurations, 
makes it more plausible that designability applies to real 
proteins. Of course, the model used in the current study is 
highly simplified — it is a low-resolution discrete model of 
a short chain with a very simple potential function. There is 
still a long way to go to show the designability principle in 
real proteins. 

Nonetheless, the insensitivity to model parameters of 
the results presented suggests that our highly designable 
structures are possible candidates for real protein design. 
It is therefore worthwhile to study some of our best 
candidates in detail and to understand what architectural 
properties distinguish the most designable structures from 
the least designable ones and how the most designable 
ones compare with known natural structures. 

Representative configurations of some of the most designable 
structures are shown in Figure 1(a-c). A striking 
characteristic of the highly designable structures is that 
each has a well-defined core consisting of a small subset of 
the amino acids of the chain. For example, in Figure 5 we 
have plotted the inaccessible surface area of each amino 
acid along the chain for the configuration appearing in 
Figure 1(b). Observe that 5 of the 23 amino acids are more 
than 70% buried. Also shown in Figure 5 is the probability 
that a hydrophobic amino acid occupies a particular site, 
averaged over all HP-sequences that design the structure, 
revealing the preference of hydrophobic amino acids for 
the core. 

A quantitative measure of the core in a structure is the 
variance $v_s$ of the exposure vector $\tilde{A}$: 
$v_s = \frac{1}{N} \sum_i \tilde{a}_i^2 - \frac{1}{N^2} \left( \sum_i \tilde{a}_i \right)^2$. 
In Figure 6, we plot $v_s$ versus the designability $N_s$. 
On average the two quantities correlate well; 
however, the scatter of the data is large in the region of low 
Fig. 5. Solid bars: Inaccessible surface for residues (Cβ spheres) of 
the highly designable configuration shown in Figure 1(b). Hollow bars: 
Probability, averaged over all HP-sequences that design the configuration, 
that a particular site along the chain is occupied by a hydrophobic 
amino acid. 
Fig. 6. The average variance $v_s$ of a cluster against the designability 
$N_s$ of the cluster for the 23-mer. The 5000 most compact configurations 
participate in the calculation, $\lambda$ = 0.4 Å, and $r_\beta$ = 1.9 Å. Gray line: running 
average with bin size 30. 

$N_s$: structures with well-formed cores are not necessarily 
highly designable. 
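
The variance of the exposure vector, used above as a quantitative measure of the core, is the mean of the squared normalized exposures minus the square of their mean. A minimal Python sketch, with a hypothetical function name and toy exposure patterns, is:

```python
def exposure_variance(exposure):
    """Variance of per-residue normalized exposure: mean of squares minus squared mean."""
    n = len(exposure)
    mean_of_squares = sum(a * a for a in exposure) / n
    mean = sum(exposure) / n
    return mean_of_squares - mean * mean

# A uniform pattern has no core; a contrasted pattern has buried and exposed sites.
flat = exposure_variance([0.25, 0.25, 0.25, 0.25])
cored = exposure_variance([0.125, 0.375, 0.375, 0.125])
print(flat, cored)
```

A larger variance indicates a sharper distinction between buried and exposed residues, which, as noted above, is associated with high designability on average but does not guarantee it.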

A zinc finger-like fold emerges from our calculation as 
one of the most designable structures. The fold [Fig. 1(b)] 
does not simply replicate 1PSV [Fig. 1(d)], on which we 
optimized our angle set. The structure of 1PSV is too open 
to be designable within our model because the small, 
uniformly sized side-chains cannot fill the large opening 
between the α-helix and the β-β turn in 1PSV. It is of 
interest that the model produces a highly designable 
solution by collapsing the α-helix onto the turn. 

Another of our most designable structures is similar to 
another small natural fold, the helix-turn-helix [see Fig. 
1(c)]. Some of our most designable structures [e.g., that 
shown in Fig. 1(a)] do not resemble any known natural 




Fig. 7. a: Backbone configuration of the 11th most designable 23-mer 
structure, using untargeted angle set (see text): $(\phi, \psi)$ = (-55°, 135°), 
(-126°, 145°), and (-85°, -25°), with a mean crms of 3.6 Å on a 
representative subset of natural structures segmented into subchains of 
21 amino acids. For this calculation, the amino acids are represented by 
spheres of radius $r_\alpha$ = 1.52 Å centered on the Cα carbons only. b: 
Backbone configuration of the zinc finger 1NC8, truncated to 23 amino 
acids. 25 



folds. These structures are candidates for the design of 
truly novel folds. 

Targeting a fold by fitting the angle set to a single chosen 
structure is not essential. For example, we can obtain a 
suitable angle set by choosing two pairs of dihedral angles 
$(\phi, \psi)$ within the β-sheet region and one pair from the 
α-helix region, locally optimizing on 160 representative 
natural structures from the PDB database. 21 Among the 
most designable structures emerging for this angle set is 
the zinc finger-like structure in Figure 7(a), shown next to 
its apparent natural counterpart, 1NC8 [Fig. 7(b)]. 25 

Recently, many studies have been conducted on the 
relation between the folding kinetics and the topology of 
native states. 26-36 In particular, it has been shown that 
folding rates and the topology of the transition states are 
closely related to the topology of the native states. In other 
words, the native state topology, which in this context is 
often measured in terms of contact order, 26,35 largely 
determines how a protein folds. It would be interesting to 
compare the two roles the native state topology plays: in 
folding kinetics and in the designability and thermody- 
namic stability. However, such a comparative study would 
preferably be done in systems of longer chains than used in 
the current study. Although it is tempting to think that 
there is a deep connection between the two roles of 
topology, one should note that there is a huge variation in 
folding rates among natural proteins, 33 which are presum- 
ably highly designable and thermodynamically stable. It 
appears that designability is largely governed by the 
surface-core patterning, 15 whereas folding kinetics de- 
pends more on the ease of forming native contacts (the 
contact order). 

In summary, we have computed the designabilities of 
structures within an off-lattice model of realistic protein- 
backbone configurations. Highly designable structures 
emerge with remarkable insensitivity to model parame- 
ters. The sequences that design these structures have 
strongly enhanced mutational stability and a large energy 
gap between the native fold and the lowest non-native 
conformation. In this light, it is interesting that recent 
mutation studies on some small proteins show that they 
maintain their native folds even when about half of their 
residues are replaced by alanine. 37,38 Some of our highly 
designable structures correspond closely to natural folds, 
such as the zinc finger and helix-turn-helix motifs. Others 
do not resemble existing structures and are candidates for 
ab initio design of novel protein folds. 